Student Team:
YES
1) We used a slight variation of the GeoTools toolkit to visualize the
geospatial data (gps.csv) and thereby the paths along which the cars move.
Source: http://www.geotools.org/
2) We also used the QGIS tool to locate and label the placemarks mentioned in
the loyalty/credit card transactions.
Source: https://www.qgis.org/en/site/forusers/download.html
3) We used D3.js to build the data visualizations that helped depict the
unusual patterns in the data.
Source: http://d3js.org/
Approximately how many hours were spent
working on this submission in total?
150 hours
May we post your submission in the
Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES
Video:
https://www.youtube.com/watch?v=0KMLvbnLQmI
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC2.1 Describe common daily routines for GAStech employees. What does a day in
the life of a typical GAStech employee look like? Please limit your response to
no more than five images and 300 words.
We developed an interface that uses GeoTools to show the paths along which cars
are moving at a particular time or date, as shown in Fig 1.1 (the interface can
be opened by clicking on the image).
Color legend (color swatches not preserved): very high frequency, high
frequency, low frequency, very low frequency.
Figure 1.1: Interface showing car geospatial patterns [Click on the image to
view the interface]
The colors drawn over the shapefile (the yellow lines over the tourist map of
Abila) in the generated images depict the frequency of cars at that particular
geospatial location. The table below shows the patterns found after analysing
the paths along which cars move in each hour, across all days.
Time | Frequency
00 hrs to 06 hrs | very low frequency throughout
07 hrs to 09 hrs | high frequency near GAStech
09 hrs to 11 hrs | low frequency throughout
11 hrs to 12 hrs | high frequency near GAStech
12 hrs to 14 hrs | high frequency near GAStech and also on the paths joining GAStech and important placemarks, as shown in Fig 1.2
14 hrs to 15 hrs | high frequency near GAStech
15 hrs to 24 hrs | low frequency throughout
Figure 1.2: Red patterns showing high-frequency paths
From this we can conclude that the working hours of GAStech employees are 7am
to 3pm.
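The hourly binning behind the table above can be sketched as follows. This is only a minimal illustration, not the team's GeoTools interface: the row layout (timestamp, car id, latitude, longitude), the timestamp format, and the sample values are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows in an assumed gps.csv layout:
# (timestamp, car id, latitude, longitude). Values are made up.
SAMPLE_GPS = [
    ("01/06/2014 07:15:00", 1, 36.0762, 24.8746),
    ("01/06/2014 07:45:10", 2, 36.0480, 24.8796),
    ("01/06/2014 07:50:00", 1, 36.0765, 24.8750),
    ("01/06/2014 12:30:00", 1, 36.0600, 24.8700),
    ("01/06/2014 23:05:00", 3, 36.0700, 24.8600),
]

def cars_per_hour(rows, fmt="%m/%d/%Y %H:%M:%S"):
    """Count distinct cars that have at least one GPS fix in each hour of day."""
    seen = set()
    for timestamp, car_id, _lat, _lon in rows:
        hour = datetime.strptime(timestamp, fmt).hour
        seen.add((hour, car_id))  # a car counts once per hour of day
    return Counter(hour for hour, _car in seen)

counts = cars_per_hour(SAMPLE_GPS)
for hour in sorted(counts):
    print(f"{hour:02d}:00  {counts[hour]} car(s)")
```

Counting distinct cars rather than raw GPS fixes keeps a single car that reports very often from inflating an hour's frequency.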
Similarly, if we analyse the data by day of the week, excluding weekends
(Sat & Sun), we see that most employees follow a particular pattern in their
paths, as shown in Fig 1.3. From this we understand that employees frequently
visit certain places from the workplace, including Barwyn Street, Jack's
Magical Beans, Abila Airport, Guys Gyros, etc.
Figure 1.3: Image showing movement patterns on weekdays
Fig 1.4 shows the number of transactions on each day of the week. By analysing
the credit card/loyalty card transactions, we can see that the number of
transactions is relatively low (
Figure 1.4: Number of transactions on each day of the week
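A per-weekday count like the one plotted in Fig 1.4 could be computed along these lines. The sketch assumes each transaction carries a timestamp in `MM/DD/YYYY HH:MM` form; that format and the sample rows are illustrative assumptions, not values from the challenge data.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical transactions: (timestamp, location). Values are made up.
SAMPLE_TRANSACTIONS = [
    ("01/06/2014 07:28", "Brewed Awakenings"),
    ("01/06/2014 13:05", "Guys Gyros"),
    ("01/11/2014 12:40", "Kronos Mart"),
]

def transactions_per_weekday(rows, fmt="%m/%d/%Y %H:%M"):
    """Count transactions by day of the week."""
    counts = Counter()
    for timestamp, _location in rows:
        day_index = datetime.strptime(timestamp, fmt).weekday()  # Monday == 0
        counts[DAY_NAMES[day_index]] += 1
    return counts

weekday_counts = transactions_per_weekday(SAMPLE_TRANSACTIONS)
for name in DAY_NAMES:
    print(name, weekday_counts[name])
```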
Fig 1.5 shows an interface (similar to that in Fig 1.1) that helps in analysing
data specific to a particular employee type. Here we have included locations
not only from the GPS data but also from the loyalty/credit card data. From the
figure, the locations frequently visited by each employee type are as follows:
Employee Type | Locations
Engineering | Hippokampos, Been There Done That
Executive | Hippokampos, Jack's Magical Beans, Brewed Awakenings
Information Technology | Hallowed Grounds, Ouzeri Elian
Facilities | Abila Airport, Carlyle Chemical Inc, Nationwide Refinery
Security | Many places (Guys Gyros relatively more often)
Figure 1.5: Interface showing employee-type-specific car geospatial patterns
[Click on the image to view the interface]
MC2.2 Identify up to twelve unusual events or patterns that you see in the
data. If you identify more than twelve patterns during your analysis, focus
your answer on the patterns you consider to be most important for further
investigation to help find the missing staff members. For each pattern or event
you identify, describe
1. What is the pattern or event you observe?
2. Who is involved?
3. What locations are involved?
4. When does the pattern or event take place?
5. Why is this pattern or event significant?
6. What is your level of confidence about this pattern or event? Why?
Please limit your answer to no more than twelve images and 1500 words.
We tried to merge the GPS locations with the locations in the credit/loyalty
card transactions. When we did so, we found a few instances where the same
person was at two different locations at the same time (i.e. the location in
the loyalty/credit card transaction differs from that in the GPS data). This
can only happen when the credit card and/or the car is used by someone other
than the employee, so such instances can be considered unusual patterns. We
show some of them below. We used the QGIS tool to visualize this data.
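The comparison described above can be sketched in code. This is a hedged illustration rather than the team's actual QGIS workflow: it assumes GPS fixes have already been mapped from car ids to people, that each card location has known coordinates, and that a transaction conflicts with the GPS data when the nearest-in-time fix lies farther than a chosen threshold from the shop. The function names, thresholds, and coordinates are all hypothetical.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def find_conflicts(transactions, gps_fixes, places,
                   max_gap=timedelta(minutes=10), max_km=0.5):
    """Return transactions whose nearest-in-time GPS fix is far from the shop."""
    conflicts = []
    for person, when, place in transactions:
        nearby = [f for f in gps_fixes
                  if f[0] == person and abs(f[1] - when) <= max_gap]
        if not nearby:
            continue  # no GPS evidence either way, so nothing to compare
        fix = min(nearby, key=lambda f: abs(f[1] - when))
        shop_lat, shop_lon = places[place]
        dist = haversine_km(fix[2], fix[3], shop_lat, shop_lon)
        if dist > max_km:
            conflicts.append((person, when, place, round(dist, 2)))
    return conflicts

# Hypothetical inputs; coordinates and people are made up for illustration.
places = {"Kronos Mart": (36.0660, 24.8500)}
transactions = [
    ("emp_a", datetime(2014, 1, 13, 3, 0), "Kronos Mart"),
    ("emp_b", datetime(2014, 1, 13, 12, 0), "Kronos Mart"),
]
gps_fixes = [
    ("emp_a", datetime(2014, 1, 13, 2, 58), 36.0890, 24.8600),   # far from the shop
    ("emp_b", datetime(2014, 1, 13, 12, 1), 36.0661, 24.8501),   # at the shop
]
conflicts = find_conflicts(transactions, gps_fixes, places)
print(conflicts)
```

The two thresholds (`max_gap`, `max_km`) control how strict the check is. Values that are too tight will flag ordinary parking a short walk from the shop as a conflict.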
First, let's consider instances in which a person was supposed to be at Kronos
Mart (according to credit card data) but was not. Fig 2.1 shows the difference
in locations.
Figure 2.1: Unusual pattern 1
Now let's consider instances in which a person was supposed to be at Been There
Done That (according to credit card data). Fig 2.2 shows the same.
Figure 2.2: Unusual patterns 2, 3
Lastly, we have instances in which a person was supposed to be at Jack's
Magical Beans. Fig 2.3 shows the same.
Figure 2.3: Unusual patterns 4, 5
Considering only the credit amounts in the transactions, we see that employees
belonging to Facilities had a total transaction amount close to 20,000.
Excluding them, the rest of the employees had totals below 5,000, with Lucas
Alcazar as an exception whose total is much higher (10,584), as shown in
Fig 2.4.
Figure 2.4: Graph showing total transaction amounts per person
If we look further into all transactions of Lucas Alcazar (Fig 2.5), we see
that the difference is due to a single transaction, so we can consider that an
unusual event.
Figure 2.5: Graph showing all transactions of Lucas Alcazar (Unusual Event 6)
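The per-person totals and the search for a single dominating transaction can be sketched as below. This is a minimal illustration under assumptions: the people, employee types, and amounts are invented, and the rule "total exceeds twice the median of the same employee type" is a stand-in for the visual comparison the team made in Fig 2.4.

```python
from collections import defaultdict
from statistics import median

def totals_per_person(transactions):
    """Sum transaction amounts per person."""
    totals = defaultdict(float)
    for person, amount in transactions:
        totals[person] += amount
    return dict(totals)

def flag_high_spenders(totals, groups, factor=2.0):
    """Flag people whose total exceeds `factor` times their group's median."""
    by_group = defaultdict(list)
    for person, total in totals.items():
        by_group[groups[person]].append(total)
    return sorted(person for person, total in totals.items()
                  if total > factor * median(by_group[groups[person]]))

def largest_transaction(transactions, person):
    """Return the largest single amount for one person."""
    return max(amount for who, amount in transactions if who == person)

# Hypothetical data: employee types and amounts are made up.
groups = {"fac_1": "Facilities", "fac_2": "Facilities",
          "it_1": "IT", "it_2": "IT", "it_3": "IT"}
sample = [
    ("fac_1", 12000.0), ("fac_1", 8000.0), ("fac_2", 19000.0),
    ("it_1", 3000.0), ("it_2", 4000.0),
    ("it_3", 1500.0), ("it_3", 9000.0),
]
totals = totals_per_person(sample)
flagged = flag_high_spenders(totals, groups)
print(flagged, largest_transaction(sample, flagged[0]))
```

Comparing within an employee type keeps a group with uniformly high totals from hiding an outlier in a group with low totals.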
Figure 2.6: Graph showing the standard deviation of amounts across various
locations (Click on the image to view specific values)
We computed the standard deviation of the transaction amounts for each location
separately. A high standard deviation at a location suggests that it has some
unusual transactions (abnormally high or low amounts). The places with high
standard deviations are Maximum Iron and Steel, Nationwide Refinery, Kronos
Pipe and Irrigation, Abila Airport, Stewart and Sons Fabrication, Carlyle
Chemical Inc., Abila Scrapyard and Frydos Autosupply n' More.
By analysing the transactions specific to these locations we found 3 unusual
events.
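The per-location spread described above can be sketched with the standard library. The sample amounts are invented, and using the population standard deviation plus "the amount farthest from the location's median" as the follow-up check is an assumption made for this example, not necessarily the team's procedure.

```python
from collections import defaultdict
from statistics import median, pstdev

def amounts_by_location(transactions):
    """Group transaction amounts by location."""
    grouped = defaultdict(list)
    for location, amount in transactions:
        grouped[location].append(amount)
    return grouped

def rank_by_spread(transactions):
    """Return (location, standard deviation) pairs, largest spread first."""
    grouped = amounts_by_location(transactions)
    spreads = {loc: pstdev(vals) for loc, vals in grouped.items()}
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)

def most_extreme_amount(transactions, location):
    """Return the amount farthest from the location's median."""
    vals = amounts_by_location(transactions)[location]
    mid = median(vals)
    return max(vals, key=lambda v: abs(v - mid))

# Hypothetical amounts, made up for illustration.
sample = [
    ("Abila Airport", 100.0), ("Abila Airport", 120.0), ("Abila Airport", 5000.0),
    ("Brewed Awakenings", 10.0), ("Brewed Awakenings", 12.0), ("Brewed Awakenings", 11.0),
]
ranked = rank_by_spread(sample)
print(ranked[0][0], most_extreme_amount(sample, ranked[0][0]))
```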
Figure 2.7: Graph showing transactions at Abila Airport (Unusual Event 7)
Figure 2.8: Graph showing transactions at Carlyle Chemical Inc. (Unusual Event 8)
Figure 2.9: Graph showing transactions at Stewart and Sons Fabrication (Unusual Event 9)
We also noticed events in which a person of a specific employee type visited
places that no or very few employees of that type, or of any other type,
visited.
Figure 2.10: Graph showing the number of times an employee has gone to a
specific place
Figure 2.11: Graph focusing on the less frequently visited locations from
Fig 2.10 (Unusual Events 10, 11, 12)
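One way to surface such rarely visited places is to count the distinct visitors per location and keep the locations with very few. The sketch below assumes visits are available as (person, employee type, location) records; the records and the `max_visitors` threshold are hypothetical.

```python
from collections import defaultdict

def rarely_visited(visits, max_visitors=1):
    """Map each location with at most `max_visitors` distinct visitors
    to the sorted list of (person, employee type) pairs seen there."""
    visitors = defaultdict(set)
    for person, employee_type, location in visits:
        visitors[location].add((person, employee_type))
    return {loc: sorted(people) for loc, people in visitors.items()
            if len(people) <= max_visitors}

# Hypothetical visit records, made up for illustration.
sample_visits = [
    ("emp_a", "Engineering", "Brewed Awakenings"),
    ("emp_b", "Security", "Brewed Awakenings"),
    ("emp_c", "Engineering", "Brewed Awakenings"),
    ("emp_b", "Security", "Abila Scrapyard"),
    ("emp_b", "Security", "Abila Scrapyard"),
]
rare = rarely_visited(sample_visits)
print(rare)
```

Repeat visits by the same person do not raise the visitor count, so a place that one employee visits many times still shows up as rare.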
MC2.3 Like most datasets, the data you were provided is imperfect, with
possible issues such as missing data, conflicting data, data of varying
resolutions, outliers, or other kinds of confusing data. Considering MC2 data
is primarily spatiotemporal, describe how you identified and addressed the
uncertainties and conflicts inherent in this data to reach your conclusions in
questions MC2.1 and MC2.2. Please limit your response to no more than five
images and 300 words.
Missing Data:
Most of the employees whose transaction amounts were analysed were assigned to
one of the employee types (in the car assignments file), but a few weren't.
Fig 3.1 shows the classification of employees. Daily routines specific to an
employee type were therefore derived from the classified employees only.
Figure 3.1: Classification of employees into employee types
The same was the case with the car IDs in the GPS data. Most were assigned to
an employee, but a few weren't, as Fig 3.2 shows. Here too, daily routines
specific to an employee type were derived from the classified car IDs only.
Figure 3.2: Classification of car IDs
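Separating assigned from unassigned car IDs amounts to a set comparison between the IDs seen in the GPS data and those in the car assignments file. The sketch below assumes the assignments are available as a mapping from car ID to employee; the IDs and names are invented.

```python
def split_car_ids(gps_ids, assignments):
    """Split the car IDs seen in the GPS data into those that appear in the
    assignments mapping (car ID -> employee) and those that do not."""
    seen = set(gps_ids)
    assigned = sorted(car_id for car_id in seen if car_id in assignments)
    unassigned = sorted(seen - set(assignments))
    return assigned, unassigned

# Hypothetical inputs, made up for illustration.
gps_ids = [1, 1, 2, 900, 2, 901]
assignments = {1: "emp_a", 2: "emp_b", 3: "emp_c"}
assigned, unassigned = split_car_ids(gps_ids, assignments)
print(assigned, unassigned)
```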
Conflicting Data:
When we tried to merge the loyalty card data and the credit card data for
preprocessing, we found that the amounts in the loyalty card and credit card
records differed by 1 or 2 digits in some instances. Fig 3.3 shows the
difference in amounts between loyalty and credit card transactions.
Figure 3.3: Difference in amounts between loyalty and credit card transactions
We resolved this by assuming that the credit card amount is reliable and
changed the values in the loyalty card transactions accordingly.
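The reconciliation rule above (trust the credit card amount) can be sketched as follows. The sketch assumes transactions can be matched on a (person, date, location) key and that each key occurs at most once per data source; the key layout, tolerance, and sample values are assumptions for illustration.

```python
def reconcile(credit, loyalty, tolerance=0.005):
    """Overwrite loyalty amounts that disagree with the matching credit card
    amount. Both inputs map (person, date, location) -> amount.
    Returns the corrected loyalty mapping and the list of changed keys."""
    corrected = dict(loyalty)  # leave the caller's data untouched
    changed = []
    for key, cc_amount in credit.items():
        if key in corrected and abs(corrected[key] - cc_amount) > tolerance:
            corrected[key] = cc_amount
            changed.append(key)
    return corrected, changed

# Hypothetical records, made up for illustration.
key_a = ("emp_a", "01/06/2014", "Kronos Mart")
key_b = ("emp_b", "01/06/2014", "Guys Gyros")
key_c = ("emp_c", "01/06/2014", "Brewed Awakenings")
credit = {key_a: 25.50, key_b: 12.00}
loyalty = {key_a: 5.50, key_b: 12.00, key_c: 8.00}

corrected, changed = reconcile(credit, loyalty)
print(changed, corrected[key_a])
```

Loyalty records without a matching credit card transaction are left unchanged, since there is nothing to correct them against.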